Teitelman's Scheme:

Warren Teitelman presented a novel scheme for real-time character
recognition in his Master's thesis, submitted in June of 1963. A
rectangle, in which the character is to be drawn, is divided into two parts,
one shaded and the other unshaded. Using this division, a computer converts
characters into ternary vectors in the following way. If the pen enters the
shaded region, a 1 is added to the vector. When the unshaded region is
entered, a 0 is appended. Finally, if the pen is lifted from the writing
surface, a w is generated. Figure 1 illustrates the basic idea he used. Thus,
with the shading shown, the character V is converted to 1 0 w 1 0.* A V drawn
without lifting the pen would yield 1 0 1, a T gives 1 0 w 1, and so on.
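
The conversion rule can be sketched in a few lines of Python. This is only an illustration: the input format (a list of strokes, each a list of booleans telling whether successive pen positions lie in the shaded region) and the name `encode` are assumptions made here, not details taken from the thesis.

```python
def encode(strokes):
    """Convert pen strokes to a ternary vector of '1', '0' and 'w'.

    strokes: list of strokes; each stroke is a list of booleans,
    True when the pen is in the shaded region (hypothetical format).
    """
    vector = []
    for stroke in strokes:
        previous = None
        for shaded in stroke:
            # Emit a symbol only when the pen enters a region,
            # that is, at the start of a stroke or on a change.
            if shaded != previous:
                vector.append("1" if shaded else "0")
                previous = shaded
        vector.append("w")  # pen lifted at the end of the stroke
    # Convention from the text: the final w is dropped.
    if vector and vector[-1] == "w":
        vector.pop()
    return vector


# V drawn in two strokes, each running from shaded to unshaded.
v_two_strokes = [[True, True, False], [True, False, False]]
print(" ".join(encode(v_two_strokes)))  # 1 0 w 1 0
```

A V drawn without lifting the pen corresponds to the single stroke `[True, False, True]` and encodes as 1 0 1.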

Notice that each character may yield several vectors, depending on
just how it is drawn. The vectors to be stored, then, depend upon the style
of the user as well as the division of the rectangle into shaded and
unshaded regions.

* Since all figures are completed by lifting the pen, it shall be the
convention to drop the final w.

    When the pen is first applied, it lands in the shaded region,
    thus yielding an initial 1.
    The pen enters the unshaded region, yielding a 0, and then
    leaves the writing surface, generating a w.
    The second stroke begins in the shaded region.
    And the pen enters the unshaded region, completing the
    character V: 1 0 w 1 0.

Figure 1. Teitelman's Scheme

In order to conserve storage space and reduce search time, the character
vectors of Teitelman's scheme are stored in a tree-like structure like
that shown in figure 2. Notice that the tree is essentially binary: only
two branches can grow from a single node. This follows since a w may
be followed only by a 1 or 0; 0 only by 1 or w; and 1 only by 0 or w.

Figure 2. A vector storage tree.
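
A minimal sketch of such a storage tree, using nested dictionaries. The node layout and the names `build_tree`, `lookup` and `max_fanout` are hypothetical; the two vectors are the V and T examples given earlier, one vector per character for simplicity.

```python
def build_tree(vectors):
    """Store character vectors in a tree of nested dicts.

    vectors: dict mapping a character to its vector (list of symbols).
    Each node is {'children': {symbol: node}, 'chars': [characters]}.
    """
    root = {"children": {}, "chars": []}
    for char, vector in vectors.items():
        node = root
        for symbol in vector:
            node = node["children"].setdefault(
                symbol, {"children": {}, "chars": []})
        node["chars"].append(char)
    return root


def lookup(root, vector):
    """Return the characters stored at the node reached by vector."""
    node = root
    for symbol in vector:
        node = node["children"].get(symbol)
        if node is None:
            return []
    return node["chars"]


def max_fanout(node):
    """Largest number of branches leaving any single node."""
    here = len(node["children"])
    return max([here] + [max_fanout(c) for c in node["children"].values()])


tree = build_tree({
    "V": ["1", "0", "w", "1", "0"],
    "T": ["1", "0", "w", "1"],
})
print(lookup(tree, ["1", "0", "w", "1"]))  # ['T']
print(max_fanout(tree) <= 2)               # True
```

Shared prefixes are stored once, which is where the saving in storage and search time comes from.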

Since several characters often end up at the same position on the
tree, Teitelman elected to resolve ambiguity by combining the information
from several different region partitionings and their associated trees.
The regions he used are shown in figure 3. His program takes a weighted
look at the characters suggested by each tree.



Figure 3. Teitelman's regions.


Comment:

Teitelman claims his scheme is superior to other character recognition
methods because the program uses the order in which parts of the character
are drawn. While this is no doubt an important factor, I feel that
something should also be said about the program's sensitivity to
connectedness. By this I mean that the program notes not only the presence
of lines, but also the general areas they connect. For example, Teitelman's
program would respond to figure 4-a by indicating "there is a line
connecting the upper left corner to the lower left corner," whereas to
figure 4-b the response would be quite different. Recognition schemes based
on template matching might equate the two figures, indicating "there are
vertical lines in the upper left, middle left, and lower left corners."

Figure 4. A connected and an unconnected sample.

Finding Good Partitions:

Suppose the input rectangle is divided into nine "atomic" areas as
in figure 5. If the part to be shaded is selected from combinations of
these atomic regions, one must suffer with 2^9 or 512 possibilities.* The
best combinations will obviously depend upon the character set to be
recognized. But to construct and evaluate all possible partitions and
associated trees would be exceedingly tedious. On the other hand, finding
any sort of analytic method of optimum partitioning seems equally
formidable, if not impossible. Hence the problem of finding a partition that
is in some sense good seems best approached heuristically.

Figure 5. Atomic areas.
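
The size of the search space is easy to confirm by enumeration. The sketch below assumes only that the nine atomic areas are numbered 1 to 9; the names are illustrative.

```python
from itertools import combinations

AREAS = range(1, 10)  # nine atomic areas, numbered 1..9


def all_partitions():
    """Yield every choice of shaded areas as a frozenset."""
    for size in range(len(AREAS) + 1):
        for chosen in combinations(AREAS, size):
            yield frozenset(chosen)


partitions = list(all_partitions())
print(len(partitions))  # 512

# A partition and its dual (shaded and unshaded swapped) are
# represented once by keeping a canonical member of each pair.
everything = frozenset(AREAS)
canonical = {min(p, everything - p, key=sorted) for p in partitions}
print(len(canonical))  # 256
```

Because nine is odd, no partition is its own dual, so merging duals halves the count exactly.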

Two simplifications were made here. First, I assume only one partitioning
is available for the recognition process. And second, that only one version
of each character is to be learned and recognized. The results are
sufficiently enlightening that these assumptions seem justified.


* If duals formed by reversing shaded and unshaded regions are eliminated,
there are still 256 possibilities.
Input:

The characters were converted to number strings by hand. First the
character set to be recognized was printed into rectangles. Then a
clear plastic template bearing the numbered atomic regions was placed on
each character in turn. The course of the pen through the atomic
regions can then be easily read off and put into the machine in the form
of a number list. See figure 6.



T converts to 2 9 6 w 1 2 3

Figure 6. Conversion of characters to number sequences.
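
Once a shaded region is chosen, a number list of this kind maps directly to a ternary vector. The sketch below is illustrative only: the function name, the use of the string "w" for a pen lift, the reading of the T example as 2 9 6 w 1 2 3, and the shaded set {2, 4, 6, 8} are all assumptions made for the example.

```python
def to_vector(sequence, shaded):
    """Convert a sequence of atomic-area numbers to a ternary vector.

    sequence: list like [2, 9, 6, "w", 1, 2, 3]; "w" marks a pen lift.
    shaded: set of atomic-area numbers making up the shaded region.
    """
    vector = []
    previous = None  # shading of the last area visited in this stroke
    for item in sequence:
        if item == "w":
            vector.append("w")
            previous = None  # the next stroke starts fresh
            continue
        now = item in shaded
        if now != previous:
            vector.append("1" if now else "0")
            previous = now
    return vector


t_sequence = [2, 9, 6, "w", 1, 2, 3]  # assumed reading of the T example
print(" ".join(to_vector(t_sequence, {2, 4, 6, 8})))  # 1 0 1 w 0 1 0
```

Moving between two atomic areas with the same shading produces no symbol, so several number lists can collapse to one vector.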

Atomic Regions:

Rather than build the shaded regions from the atomic regions of
figure 3 as Teitelman did, I chose those of figure 7.



Figure 7. Atomic areas used. 












I think this set is somewhat better in accommodating letters
with diagonal strokes, such as K and V. The advantage is that
minor variants are not split into different sequences of input
numbers. This is desirable since it renders the assumption of
only one version per character somewhat less unrealistic. X, for
example, yields the sequence 1 9 5 w 3 9 7 with my scheme, even
if its position and slant are slightly altered. But using
Teitelman's layout, the first stroke alone yields four variants
as shown in figure 8. Coupled with four for the second stroke,
X would have 16 possible sequences.

Figure 8. Variants of first stroke of X.

Tree Criteria:

As various partitionings are proposed, there must be some
measure of how good the trees they generate are. After some
thought, two qualities were selected as most important: first,
the number of branches should be small; and second, instances
of more than one character at a node should be rare. Subroutines
were written to examine tree structures for these qualities and
return with a pair of numbers. The first number was just the
number of branches. The second was the probability of error,
given that all characters are equally likely and that when several
characters are found at a node, one is selected at random.
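
One way to compute the two numbers is sketched below. It assumes one vector per character and counts a branch for every distinct non-empty prefix of a stored vector, which is one plausible reading of "number of branches"; the function name and the sample vectors are illustrative.

```python
from fractions import Fraction


def tree_criteria(vectors):
    """Return (B, P) for one vector per character.

    vectors: dict mapping a character to its vector (list of symbols).
    B counts the branches of the storage tree; P is the probability of
    error when characters are equally likely and ties at a node are
    broken at random.
    """
    branches = set()  # each distinct non-empty prefix is one branch
    nodes = {}        # full vector -> number of characters stored there
    for vector in vectors.values():
        for end in range(1, len(vector) + 1):
            branches.add(tuple(vector[:end]))
        key = tuple(vector)
        nodes[key] = nodes.get(key, 0) + 1
    # A node holding k characters is read wrongly (k - 1)/k of the time
    # for each of its k characters, contributing k - 1 in total.
    wrong = sum(k - 1 for k in nodes.values())
    return len(branches), Fraction(wrong, len(vectors))


b, p = tree_criteria({
    "C": ["1", "0"],
    "U": ["1", "0"],
    "Y": ["1", "w", "1"],
})
print(b, p)  # 4 1/3
```

With every character at a single node, the same rule gives P = (N - 1)/N for N characters.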

The branch number, B, and the probability-of-error number, P,
are combined into an overall badness factor, W, by the following formula:

    W = B^2 + cP

This formula was arrived at by certain intuition-guided
considerations designed to provide a reasonable balance between
probability of error and number of branches. Ideally there would be just as
many branches as there were characters and no characters would be confused.

B was squared since the penalty for branches should be small until
there are somewhat more branches than characters, but then the penalty
should rapidly become severe. P was entered linearly since there seemed
to be no good reason for another form. The weighting constant, c, was
selected so that the contribution from each term is the same when there are
twice as many branches as characters and when the probability of error is
1/16, both conditions seeming equally bad to me.
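
The balancing condition fixes the constant once the number of characters and the reference probability are chosen. In the sketch below both are parameters; the function names are hypothetical, and the value 1/16 used in the example is an assumed reading of the reference probability.

```python
from fractions import Fraction


def weighting_constant(n_chars, p_ref):
    """Choose c so that B^2 == c * P when B = 2 * n_chars and P = p_ref."""
    return Fraction((2 * n_chars) ** 2) / p_ref


def badness(b, p, c):
    """Overall badness factor W = B^2 + c * P."""
    return b ** 2 + c * p


c = weighting_constant(26, Fraction(1, 16))  # 26 letters, assumed p_ref
print(c)                                # 43264
print(badness(52, Fraction(1, 16), c))  # 5408
```

At the balance point the two terms are equal, so W is simply twice B^2.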

W = P(c + B^2) is an attractive alternative formula I have not explored.

The Heuristics:

The program itself was straightforward. A flowchart is given in
figure 9. Its goal was to take an existing partitioning and improve it
by either adding or deleting a single atomic area. It was guided in this
by four simple heuristics: two were specifically designed to reduce the
number of branches, and two to reduce the probability of error. Which
pair was tried first depended on the particular weaknesses of the existing
tree. Recall that the inspection routine returns not with just an overall
merit factor, but rather with a branch number and a probability-of-error
number. If the contribution from the branches to the badness factor is
greater than that of the probability-of-error, then B^2 - cP > 0 and the
branch heuristics are tried first. Otherwise the probability heuristics are
tried first.

The first branch reduction heuristic is simply to remove from the
shaded region an atomic area that has no common border with any other in
the shaded region except possibly the center. Intuitively this should
reduce the length of many vectors as it does for the P vector in figure 10.




Figure 9. General flowchart.

Figure 10. Branch heuristic 1.

The second branch reduction heuristic adds an atomic area that
shares both its non-center borders with areas already in the region.
Note how this reduces the length of the L vector in figure 11, for example.




Figure 11. Branch heuristic 2.


Since reducing the number of branches increases the likelihood of
finding more than one character at a node, as one might expect, the
probability-of-error reducing heuristics complement the branch reducing
heuristics. The first probability heuristic involves adding a separated
region; the second involves dropping an interior region. See figures
12 and 13.
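
The four heuristics can be sketched as functions that propose candidate regions. The geometry here is an assumption: areas 1 through 8 are taken to encircle the center area 9 in order, each bordering its two ring neighbors and the center. "Separated" is read as having no shaded ring neighbor and "interior" as having both ring neighbors shaded. All names are hypothetical.

```python
CENTER = 9
RING = [1, 2, 3, 4, 5, 6, 7, 8]  # assumed to encircle the center in order


def ring_neighbors(area):
    """The two ring areas assumed to border a non-center area."""
    i = RING.index(area)
    return RING[i - 1], RING[(i + 1) % len(RING)]


def shaded_neighbors(area, shaded):
    """How many of the area's ring neighbors are shaded (0, 1 or 2)."""
    return sum(n in shaded for n in ring_neighbors(area))


def branch_heuristic_1(shaded):
    """Remove a shaded area with no shaded neighbor (center aside)."""
    return [shaded - {a} for a in RING
            if a in shaded and shaded_neighbors(a, shaded) == 0]


def branch_heuristic_2(shaded):
    """Add an area whose two non-center neighbors are both shaded."""
    return [shaded | {a} for a in RING
            if a not in shaded and shaded_neighbors(a, shaded) == 2]


def probability_heuristic_1(shaded):
    """Add a separated area: unshaded, with no shaded neighbor."""
    return [shaded | {a} for a in RING
            if a not in shaded and shaded_neighbors(a, shaded) == 0]


def probability_heuristic_2(shaded):
    """Drop an interior area: shaded, with both neighbors shaded."""
    return [shaded - {a} for a in RING
            if a in shaded and shaded_neighbors(a, shaded) == 2]


region = frozenset({1, 2, 3, 6})
print([sorted(r) for r in branch_heuristic_1(region)])       # [[1, 2, 3]]
print([sorted(r) for r in probability_heuristic_2(region)])  # [[1, 3, 6]]
```

Under this reading each probability heuristic is the inverse move of one branch heuristic, which fits the remark that the two pairs complement each other.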



C: 1 0
U: 1 0 1 0

Figure 12. Probability heuristic 1.

C: 1 0
U: 1 0

Y: 1 w 1
T: 1 w 1

Y: 1 w 1
T: 1 0 1 w 0 1

Figure 13. Probability heuristic 2.

Note that if the first pair of heuristics tried fail, then
the program tries adding or deleting the center region. Then, in
desperation, it enters the remaining pair of heuristics. If none of these
work, it reports that it has failed.
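
The control flow just described can be sketched as follows. The interfaces are hypothetical: `evaluate` returns the pair (B, P) for a shaded region, and each heuristic maps a region to a list of candidate regions. Accepting the first candidate that lowers W is also an assumption; the text does not say how candidates were ranked.

```python
def improve(shaded, evaluate, branch_heuristics, probability_heuristics,
            c, center=9):
    """Try to find one better partition; return it, or None on failure.

    evaluate(shaded) -> (B, P); each heuristic maps a shaded set to a
    list of candidate shaded sets (all hypothetical interfaces).
    """
    def badness(region):
        b, p = evaluate(region)
        return b ** 2 + c * p

    b, p = evaluate(shaded)
    current = b ** 2 + c * p
    # Attack whichever term contributes more to the badness factor.
    if b ** 2 - c * p > 0:
        first, second = branch_heuristics, probability_heuristics
    else:
        first, second = probability_heuristics, branch_heuristics

    def center_change(region):
        # Add the center area if absent, delete it if present.
        return [region ^ {center}]

    for stage in (first, [center_change], second):
        for heuristic in stage:
            for candidate in heuristic(shaded):
                if badness(candidate) < current:
                    return candidate
    return None  # the program reports that it has failed


def search(shaded, *args, **kwargs):
    """Repeat single-area improvements until none succeeds."""
    while True:
        better = improve(shaded, *args, **kwargs)
        if better is None:
            return shaded
        shaded = better
```

A call such as `search(frozenset(), evaluate, [h1, h2], [h3, h4], c=c)` starts from an empty shaded region, as in the first experiment reported in the results.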
































RESULTS 


English Block Capitals:

The program was used on the 26-letter English alphabet with
encouraging results. Starting with nothing in the shaded region, the
program evolved the 2-3-4-6-8 region of figure 14. Only 63 branches
were used, of which 12 were w branches that cannot bear letters. The I
is confused with Z, and W with V. The program looped 8 times and tried
17 of the possible 512 partitionings before finding the best it could
come up with. The region changes are indicated below.

Figure 14. Block capital region.
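
As a check on the error figure, the two confusions can be worked through under the stated assumptions (equally likely characters, random choice at a shared node). Each of the four letters involved is misread half the time, and the other 22 are never misread:

```latex
P = \frac{1}{26}\left(4 \cdot \frac{1}{2} + 22 \cdot 0\right) = \frac{2}{26}
```

This agrees with the final value of P, 2/26, in the list of region changes.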






















(B P): the first figure is B, the number of branches; the second figure
is P, the probability of error.

Reports, in order:
    probability heuristic called + used
    probability heuristic called + used
    probability heuristic called + used
    branch heuristic called + used
    probability heuristics called and not used
        center change did no good
        desperation branch heuristic used
    probability heuristic called + used
    failed

B (legible entries only): 32, 65, 54
P, in order: 25/26, 16/26, 10/26, 7/26, 4/26, 5/26, 4/26, 2/26



























Greek Lower Case:


Results were even better using the lower case Greek letters in
the form seen in figure 15. The final tree had 52 branches with only
one pair of letters sharing a node. Again, the program was called 8 times
and again the first heuristic tried was used in 6 of the 7 successful
attempts at improvement. Only 12 configurations were tried before the winner
was found. The winning tree is shown in figure 16.
Figure 15. Greek alphabet.
Reports, in order, starting from (0 23/24):
    probability heuristic called + used
    probability heuristic called + used
    probability heuristic called + used
    probability heuristic called + used
    branch heuristic called + used
    probability heuristics called and not used
        center change did no good
        desperation branch heuristic used
    probability heuristic called + used
    failed

B (legible entries only): 0, 17
P, in order: 23/24, 16/24, 10/24, 5/24, 3/24, 4/24, 3/24, 1/24
























A second experiment with the Greek letters was performed to see if
the program would home in on the 2-4-5-6-8 region from another starting
point. The 9-4-3 region was selected more or less at random as an
alternate starting point, and the program successfully reached the same end
state, this time in 6 steps. See figure 17.



Figure 17. Sequence of areas reported 
in second Greek alphabet experiment. 

A third experiment started from 1-9-5-7-2 but failed without
arriving at the former solutions. See figure 18.



Figure 18. Sequence of areas reported
in third Greek alphabet experiment.
CONCLUSION + EXTENSIONS

Conclusions:

The program as it stands seemed to work quite well. In general,
the first heuristic tried succeeded in improving the badness factor
by reducing the branch or probability factor it was designed for.

Experiments with the Greek alphabet indicate that final results do not
depend strongly on the starting pattern: in two cases the same final pattern
evolved, and in a third, the final pattern was different, but the badness
factor was nearly the same.

Further Work:

It would be interesting to pump in more and more characters to see
at what point saturation can be exhibited. That is, about how
many characters are required before the program is incapable of reducing
the probability of error below some arbitrary figure, say 1/5.

The badness factor, W = B^2 + cP, worked surprisingly well. The
powers used, the constant c, and the general form were a bit arbitrary and
might be improved.

Generalizing the problem to the case in which a character is associated
with several input sequences would certainly require a more general
function.

If more than one tree is allowed for recognition, the problem becomes
more difficult. W would have to apply to sets of trees, and heuristics
would have to be found to operate on sets of partitionings.




